Module 02: Data Visualisation
School of Mathematical and Physical Sciences
Data Visualisation with ggplot2
(Figure: "R: base version"; artwork by @allison_horst)
R has several systems for making graphs, but ggplot2 of Wickham et al. (2022) is one of the most elegant and most versatile. ggplot2 implements the grammar of graphics, a coherent system for describing and building graphs. With ggplot2, you can do more, faster, by learning one system and applying it in many places.
With ggplot2, you begin a plot with the function ggplot().
ggplot() creates a coordinate system that you can add layers to. The first argument of ggplot() is the dataset to use in the graph. ggplot(data = df) creates an empty graph, so I'm not going to show it here. You complete the graph by adding one or more layers to ggplot().
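A minimal sketch of this idea, assuming the palmerpenguins package (introduced later in this module) is installed: ggplot() on its own produces a plot object with no layers, and adding a geom function with + gives it one layer. The choice of flipper length and body mass is only for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# ggplot() on its own: a coordinate system with no layers (an empty graph)
p_empty <- ggplot(data = penguins)

# adding a geom function with + adds one layer to that plot
p_layer <- p_empty +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g), na.rm = TRUE)

# number of layers in each plot object
length(p_empty$layers)
length(p_layer$layers)
```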
The function geom_dotplot() adds a layer of points to your plot, which creates a dot plot. ggplot2 comes with many geom functions that each add a different type of layer to a plot. Each geom function in ggplot2 takes a mapping argument.
This mapping argument is always paired with aes(), and the x and y arguments of aes() specify which variables to map to the x and y axes. ggplot2 looks for the mapped variables in the data argument.
The penguins data is from the palmerpenguins package of Horst, Hill, and Gorman (2022).
# pak::pak("palmerpenguins") # if not installed, or
# install.packages("palmerpenguins")
library(tidyverse)
library(palmerpenguins)
glimpse(penguins)
# Rows: 344
# Columns: 8
# $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
# $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
# $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
# $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
# $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
# $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
# $ sex <fct> male, female, female, NA, female, male, female, male…
# $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
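As a sketch of a geom with a mapping, the code below draws a dot plot of body mass for each species, with aes() mapping species to the x axis and body_mass_g to the y axis. The settings binaxis = "y", stackdir = "center" and binwidth = 50 are illustrative choices, not values from the original slides; na.rm = TRUE drops the penguins with missing body mass without a warning.

```r
library(ggplot2)
library(palmerpenguins)

# map species to the x axis and body mass (g) to the y axis with aes()
p_dot <- ggplot(data = penguins, mapping = aes(x = species, y = body_mass_g)) +
  # bin along the y axis and stack the dots symmetrically around each species
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 50, na.rm = TRUE)

p_dot
```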
(Credit: Emi Tanaka)
You do not need to write data =, mapping =, x =, and y = each time in ggplot. ggplot code in the wild often omits these argument names.
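A short sketch of the two styles, using the penguins data for illustration: the first call names every argument, while the second relies on data and mapping being the first two arguments of ggplot(), and x and y being the first two arguments of aes(). Both describe the same plot.

```r
library(ggplot2)
library(palmerpenguins)

# fully explicit: every argument is named
p_explicit <- ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)

# concise: arguments are matched by position
p_concise <- ggplot(penguins, aes(flipper_length_mm, body_mass_g)) +
  geom_point(na.rm = TRUE)
```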
Some commonly used geoms:
| geom | Description |
|---|---|
| geom_abline, geom_hline, geom_vline | Reference lines: horizontal, vertical, and diagonal |
| geom_bar, geom_col | Bar charts |
| geom_boxplot | A box and whiskers plot (in the style of Tukey) |
| geom_density | Smoothed density estimates |
| geom_dotplot | Dot plot |
| geom_freqpoly, geom_histogram | Histograms and frequency polygons |
| geom_jitter | Jittered points |
| geom_path, geom_line, geom_step | Connect observations |
| geom_point | Points |
| geom_qq_line, geom_qq | A quantile-quantile plot |
| geom_smooth | Smoothed conditional means |
| geom_label, geom_text | Text |
| geom_violin | Violin plot |
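Several geoms from the table can be layered on the same plot. A sketch, with bill length by species chosen only for illustration: geom_boxplot() summarises each species and geom_jitter() overlays the individual penguins. Setting outlier.shape = NA hides the boxplot's own outlier points, so those observations are not drawn twice.

```r
library(ggplot2)
library(palmerpenguins)

p_box <- ggplot(penguins, aes(x = species, y = bill_length_mm)) +
  # box and whiskers per species; outliers hidden because the jitter layer shows every point
  geom_boxplot(outlier.shape = NA, na.rm = TRUE) +
  # raw observations, spread horizontally to reduce overplotting
  geom_jitter(width = 0.2, alpha = 0.5, na.rm = TRUE)

p_box
```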
(Credit: Emi Tanaka)
The group aesthetic in ggplot
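The group aesthetic controls which observations a geom treats as belonging together. By default, ggplot2 forms groups from the discrete variables mapped in the plot; when none are mapped, all observations form a single group. A sketch using R's built-in Orange data (circumference of five orange trees over time), which is not part of the original slides: without group, geom_line() joins all the points into one zig-zag line, while group = Tree draws one line per tree.

```r
library(ggplot2)

# no discrete variable is mapped, so all points form a single group
p_single <- ggplot(Orange, aes(x = age, y = circumference)) +
  geom_line()

# group = Tree tells geom_line() to draw a separate line for each tree
p_grouped <- ggplot(Orange, aes(x = age, y = circumference, group = Tree)) +
  geom_line()

# count the groups ggplot2 computed for the first layer of a plot
n_groups <- function(p) length(unique(ggplot_build(p)$data[[1]]$group))
n_groups(p_single)
n_groups(p_grouped)
```

Mapping colour = Tree instead would also give one line per tree, because Tree is a discrete variable and is then used for grouping automatically.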
The patchwork package
(Artwork by @allison_horst)
The patchwork package of Pedersen (2022) allows us to combine and arrange multiple plots on the same graph. Plots can be saved to a file with ggsave().
FAFSA = Free Application for Federal Student Aid.
The exercises are in module02_exercises.html.
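A minimal sketch, assuming the patchwork and palmerpenguins packages are installed: two plots are placed side by side with +, given a shared title with plot_annotation(), and the combined figure is written to a PDF with ggsave(). The file name, location and size are illustrative only.

```r
library(ggplot2)
library(patchwork)
library(palmerpenguins)

p1 <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)
p2 <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# `+` places plots side by side; `/` would stack them vertically
combined <- p1 + p2 + plot_annotation(title = "Palmer penguins")

# save the combined figure (width and height are in inches by default)
out_file <- file.path(tempdir(), "penguins_combined.pdf")
ggsave(out_file, plot = combined, width = 8, height = 4)
```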
Combining the skills learnt so far
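The plotting code in this section uses data frames dat1, dat2 and dat3 whose construction is not shown. The sketch below is one hypothetical way to build tables shaped like dat2 and dat3 from the ratings data, assuming that data comes from the bakeoff package; it is not the original preparation. dat2_sketch keeps the 7-day viewers of the premiere ("first") and finale ("last") of each series, and dat3_sketch computes finale_bump as finale minus premiere.

```r
library(tidyverse)
library(bakeoff) # assumed source of the `ratings` data

# hypothetical dat2: 7-day viewers for the first and last episode of each series
dat2_sketch <- ratings %>%
  mutate(
    series = factor(series),
    episode = as.numeric(episode)
  ) %>%
  group_by(series) %>%
  filter(episode %in% range(episode)) %>%
  mutate(episode = if_else(episode == min(episode), "first", "last")) %>%
  ungroup() %>%
  select(series, episode, viewers_7day)

# hypothetical dat3: one row per series with the finale "bump" over the premiere
dat3_sketch <- dat2_sketch %>%
  pivot_wider(names_from = episode, values_from = viewers_7day) %>%
  mutate(finale_bump = last - first)
```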
# dat1: per-episode ratings data prepared in an earlier step (not shown)
# create coordinates for labels
series_labels <- dat1 %>%
  group_by(series) %>%
  summarize(
    y_position = median(viewers_7day) + 1,
    x_position = mean(episode_count)
  )

# make the plot
ggplot(dat1, aes(x = episode_count, y = viewers_7day, fill = series)) +
  geom_col(alpha = .9) +
  ggtitle("Series 8 was a Big Setback in Viewers",
    subtitle = "7-Day Viewers across All Series/Episodes"
  ) +
  geom_text(data = series_labels, aes(
    label = series,
    x = x_position,
    y = y_position
  )) +
  theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  ) +
  # episode_count is numeric, so the x axis needs a continuous scale
  scale_x_continuous(expand = c(0, 0))

library(bakeoff) # assumed: the `ratings` data used below comes from the bakeoff package
line_labels <- ratings %>%
  group_by(series) %>%
  mutate(episode = as.numeric(episode)) %>%
  slice_tail(n = 1) %>%
  select(series, x_position = episode, y_position = viewers_7day)

ggplot(ratings, aes(
  x = as.numeric(episode),
  y = viewers_7day,
  colour = series,
  group = series
)) +
  geom_line() +
  labs(colour = "Series", x = "Episode") +
  geom_text(data = line_labels, aes(
    label = series,
    x = x_position + .25,
    y = y_position
  ))

slope_labels <- dat2 %>%
  filter(episode == "last") %>%
  select(series, x_position = episode, y_position = viewers_7day)

ggplot(dat2,
  aes(
    x = episode,
    y = viewers_7day,
    colour = series,
    group = series
  )
) +
  geom_point() +
  geom_line() +
  geom_text(
    data = slope_labels, aes(
      label = series,
      x = x_position,
      y = y_position
    ),
    nudge_x = .1
  ) +
  theme(
    panel.grid = element_blank(),
    axis.line = element_line(colour = "gray")
  )

# plot
ggplot(dat3, aes(
  x = fct_rev(series),
  y = finale_bump
)) +
  geom_col(alpha = .7) +
  coord_flip() +
  labs(x = "Series", y = "Difference in Viewers for Finale from Premiere (millions)") +
  ggtitle("Finale 'Bumps' were Smallest for Series 10",
    subtitle = "Finale 7-day Viewers Relative to Premiere"
  )